AI Safety Gridworlds

Authors

Jan Leike

Miljan Martic

Victoria Krakovna

Pedro A. Ortega

Tom Everitt

Andrew Lefrancq

Laurent Orseau

Shane Legg

Abstract

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
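The distinction between the observed reward and the hidden performance function can be illustrated with a toy example. The sketch below is not taken from the paper's environments: the class `ToyCorridor`, its cell layout, and all numeric values are hypothetical, chosen only to show how an agent can collect a high reward while the hidden performance score records an unwanted side effect.

```python
from dataclasses import dataclass


@dataclass
class ToyCorridor:
    """Hypothetical 1-D corridor, not one of the paper's environments.

    The agent starts at cell 0 and the goal is the last cell. Entering
    `side_effect_cell` is not penalized in the observed reward, but it is
    penalized in the hidden performance score.
    """
    length: int = 4
    side_effect_cell: int = 2
    side_effect_penalty: float = 10.0  # assumed value, for illustration only
    pos: int = 0
    hidden_performance: float = 0.0    # never returned to the agent

    def step(self, action: int):
        """Move by `action` (+1 or -1) and return (observation, reward, done)."""
        self.pos = max(0, min(self.length - 1, self.pos + action))
        done = self.pos == self.length - 1
        # Observed reward: -1 per step, plus 50 on reaching the goal.
        reward = (50.0 if done else 0.0) - 1.0
        # Hidden performance: same as the reward, minus the side-effect penalty.
        performance = reward
        if self.pos == self.side_effect_cell:
            performance -= self.side_effect_penalty
        self.hidden_performance += performance
        return self.pos, reward, done


def run_episode(env: ToyCorridor, actions):
    """Apply actions until done; return (total observed reward, hidden performance)."""
    total_reward = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, env.hidden_performance


total_reward, performance = run_episode(ToyCorridor(), [1, 1, 1])
print(total_reward, performance)
```

Because the two scores differ here, this toy case would fall on the specification side of the abstract's categorization; if the performance score always equaled the observed reward, it would fall on the robustness side.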


Similar resources

Gridworlds as Testbeds for Planning with Incomplete Information

Gridworlds are popular testbeds for planning with incomplete information but not much is known about their properties. We study a fundamental planning problem, localization, to investigate whether gridworlds make good testbeds for planning with incomplete information. We find empirically that greedy planning methods that interleave planning and plan execution can localize robots very quickly on...


A Survey of Reinforcement Learning Methods in the Windy and Cliff-walking Gridworlds

This report details the implementation of three reinforcement learning methods, Monte Carlo, SARSA, and Q-Learning, and compares their performance in the Windy and CliffWalking Gridworlds.


Efficient Incremental Search for Moving Target Search

Incremental search algorithms reuse information from previous searches to speed up the current search and are thus often able to find shortest paths for series of similar search problems faster than by solving each search problem independently from scratch. However, they do poorly on moving target search problems, where both the start and goal cells change over time. In this paper, we thus deve...


Subgoal Graphs for Eight-Neighbor Gridworlds

We propose a method for preprocessing an eight-neighbor gridworld to generate a subgoal graph, and a method for using this subgoal graph to find shortest paths faster than A*: first find high-level paths through subgoals, then find shortest low-level paths between consecutive subgoals on the high-level path.


Robust Computer Algebra, Theorem Proving, and Oracle AI

In the context of superintelligent AI systems, the term “oracle” has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research...



Journal:

CoRR

Volume: abs/1711.09883

Publication date: 2017
